Mining Top-K Graph Patterns that Jointly Maximize Some Significance Measure
نویسندگان
چکیده
Most of graph pattern mining algorithms focus on finding frequent subgraphs and its compact representations, such as closed frequent subgraphs and maximal frequent subgraphs. However, little attention has been paid to mining graph patterns with user-specified significance measure. In this paper, we study a new problem of mining top-k graph patterns that jointly maximize some significance measure from graph databases. Exploiting entropy and information gain, we give two problem formulations, EM and IGM. We first prove them to be NP-hard and then propose two efficient algorithms, PP-TopK and DM-TopK, to solve them. PP-TopK greedily selects top-k graph patterns among frequent subgraphs. DM-TopK integrates the pruning techniques into the mining framework, and directly mines top-k graph patterns from graph databases. Empirical results demonstrate the quality of our top-k graph patterns, and validate the efficiency and scalability of our algorithms.
منابع مشابه
Pushing Constraints to Generate Top-K Closed Sequential Graph Patterns
In this paper, the problem of finding sequential patterns from graph databases is investigated. Two serious issues dealt in this paper are efficiency and effectiveness of mining algorithm. A huge volume of sequential patterns has been generated out of which most of them are uninteresting. The users have to go through a large number of patterns to find interesting results. In order to improve th...
متن کاملTGP: Mining Top-K Frequent Closed Graph Pattern without Minimum Support
In this paper, we propose a new mining task: mining top-k frequent closed graph patterns without minimum support. Most previous frequent graph pattern mining works require the specification of a minimum support threshold to perform the mining. However it is difficult for users to set a suitable value sometimes. We develop an efficient algorithm, called TGP, to mine patterns without minimum supp...
متن کاملEfficient Mining of Top-k Breaker Emerging Subgraph Patterns from Graph Datasets
This paper introduces a new type of discriminative subgraph pattern called breaker emerging subgraph pattern by introducing three constraints and two new concepts: base and breaker. A breaker emerging subgraph pattern consists of three subpatterns: a constrained emerging subgraph pattern, a set of bases and a set of breakers. An efficient approach is proposed for the discovery of top-k breaker ...
متن کاملMining Top-K Large Structural Patterns in a Massive Network
With ever-growing popularity of social networks, web and bio-networks, mining large frequent patterns from a single huge network has become increasingly important. Yet the existing pattern mining methods cannot offer the efficiency desirable for large pattern discovery. We propose SpiderMine, a novel algorithm to efficiently mine top-K largest frequent patterns from a single massive network wit...
متن کاملMining Statistically Significant Patterns using the Chi-Square Statistic
Statistical significance is used to ascertain whether the outcome of a given experiment can be ascribed to some extraneous factors or is solely due to chance. An observed pattern of events is deemed to be statistically significant if it is unlikely to have occurred due to randomness or chance alone. In the thesis, we study the problem of identifying the statistically relevant patterns in string...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- JCP
دوره 5 شماره
صفحات -
تاریخ انتشار 2010